Concept Formation From Very Large Training Sets

Author

  • Richard A. O'Keefe
Abstract

This paper proposes an alternative to Quinlan's algorithm for forming classification trees from large sets of examples. My algorithm is guaranteed to terminate. Quinlan's algorithm is usually faster.

I. The Nature of the Problem.

We have a population of objects which we want to classify into two** groups. We have a set of attributes, each with a small finite number of distinct values, and a set of examples whose attributes have been measured and which have already been classified. Our goal is to find a rule, based on these examples, which we can use to classify other members of the population.

In general this will lead us to statistical methods such as cluster analysis (Everitt, 1974; Kendall, 1975; Sturt, 1981a; Sturt, 1981b). The larger our collection of examples, the more likely it is that some of them are misclassified. Sturt (1981a) provides an excellent illustration of how improving the fit of a rule to the training set (beyond a certain point) can make it perform worse on the rest of the population.

There are, however, interesting problems where the domain is formal rather than real, and we can be sure that all the relevant information is available and our classifications are correct. Chess positions and algebraic equations are two such domains.

* This work was supported by a Commonwealth Scholarship. Computing was done on the SERC DEC-10 under grant CR/C/20826.
** ID3 and my algorithm are both presented in terms of two categories. The information heuristic generalises to any fixed number of categories, and so do the algorithms.
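The abstract names the information heuristic without stating it. As a rough illustration only, the sketch below assumes it is the usual entropy-based information gain associated with ID3, written so that it works for any fixed number of categories. The function names and the dict-per-example layout are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(examples, labels, attribute):
    """Expected entropy reduction from splitting on one attribute.

    examples: list of dicts mapping attribute name -> value (assumed layout)
    labels:   one class label per example; any number of distinct classes
    """
    total = len(labels)
    # Group the class labels by the value each example takes for this attribute.
    partitions = {}
    for example, label in zip(examples, labels):
        partitions.setdefault(example[attribute], []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def best_attribute(examples, labels, attributes):
    """Pick the attribute with the largest information gain."""
    return max(attributes, key=lambda a: information_gain(examples, labels, a))
```

A tree builder would call best_attribute at each node, split the examples on the chosen attribute, and recurse on each subset. Nothing here reflects how either algorithm handles very large training sets, which is the subject of the paper itself.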

Similar resources

IREP++, A Faster Rule Learning Algorithm

We present IREP++, a rule learning algorithm similar to RIPPER and IREP. Like these other algorithms IREP++ produces accurate, human readable rules from noisy data sets. However IREP++ is able to produce such rule sets more quickly and can often express the target concept with fewer rules and fewer literals per rule resulting in a concept description that is easier for humans to understand. The...

CNRS TELECOM ParisTech at ImageCLEF 2016 Scalable Concept Image Annotation Task: Overcoming the Scarcity of Training Data

We introduce our participation at the ImageCLEF 2016 scalable concept detection and localization task. As in ImageCLEF 2015, this edition focuses on generating not only annotations (concept detection) but also localizing concepts into a large image collection. In our runs, we focus mainly on concept detection; our solution is purely visual and based on deep features combined with standard linea...

Decision Tree Learning on Very Large Data Sets

Consider a labeled data set of 1 terabyte in size. A salient subset might depend upon the user's interests. Clearly, browsing such a large data set to find interesting areas would be very time consuming. An intelligent agent which, for a given class of user, could provide hints on areas of the data that might interest the user would be very useful. Given large data sets having categories of sali...

Visual Concept Learning from Weakly Labeled Web Videos

Concept detection is a core component of video database search, concerned with the automatic recognition of visually diverse categories of objects (“airplane”), locations (“desert”), or activities (“interview”). The task poses a difficult challenge as the amount of accurately labeled data available for supervised training is limited and coverage of concept classes is poor. In order to overcome ...

A theoretical review of the concept of superego from the perspective of psychoanalytic approaches

The aim of this research was to investigate the concept of the superego from the perspective of psychoanalytic approaches, in particular the theory of Freud and Anna Freud and other theories of object relations (Klein and Bion) and the British Independent School (Fairbairn and Winnicott). Given the significant role of the pathological superego in psychological disorders, a better understanding ...

The Inefficiency of Batch Training for Large Training Sets

Multilayer perceptrons are often trained using error backpropagation (BP). BP training can be done in either a batch or continuous manner. Claims have frequently been made that batch training is faster and/or more "correct" than continuous training because it uses a better approximation of the true gradient for its weight updates. These claims are often supported by empirical evidence on very s...

Publication date: 1983